Software support for practical grammar development

Authors

  • Branimir Boguraev
  • John A. Carroll
  • Ted Briscoe
  • Claire Grover
Abstract

Even though progress in theoretical linguistics does not necessarily rely on the construction of working programs, a large proportion of current research in syntactic theory is facilitated by suitable computational tools. However, when natural language processing applications seek to draw on the results from new developments in theories of grammar, not only does the nature of the tools have to change, but the tools also face the challenge of reconciling the seemingly contradictory requirements of notational perspicuity and efficiency of performance. In this paper, we present a comparison and an evaluation of a number of software systems for grammar development, and argue that they are inadequate as practical tools for building wide-coverage grammars. We discuss a number of factors characteristic of this task, demonstrate how they influence the design of a suitable software environment, and describe the implementation of a system which has supported efficient development of a large computational grammar of English.

1. Tools for Grammar Development

A number of research projects within the broad area of natural language processing (NLP) and theoretical linguistics make use of special-purpose programs, which are beginning to be known under the general term of "grammar development environments" (GDEs). Particularly well-known examples are reported in Kaplan (1983) (see also Kiparsky, 1985), Shieber (1984), Evans (1985), Phillips and Thompson (1985), Jensen et al. (1986) and Karttunen (1986). In all instances the software packages cited above fall in the class of computational tools used in theoretical (rather than applied) projects.
Thus Kaplan's Grammar-writer's Workbench is an implementation of a particular linguistic theory (Lexical Functional Grammar; Kaplan and Bresnan, 1982); similarly, Evans' ProGram incorporated an early version of Generalized Phrase Structure Grammar (GPSG; Gazdar and Pullum, 1982), whilst PATR-II is a "virtual linguistic machine", developed by Shieber as a tool for experimenting with a variety of syntactic theories. These systems differ in their goals. Particular implementations of a theory may be used for observing how theory-internal devices interact with each other, or to maintain internal consistency as the grammar is being developed. On the other hand, formalisms for encoding linguistic information in a uniform way underpin efforts to compare and evaluate alternative linguistic theories (Shieber, 1987). Neither type of system is adequate to the task of grammar development on a large scale or for incorporating such a grammar into a practical NLP system, due to factors such as efficiency of encoding (largely neglected in such systems) or verbosity and redundancy of the formal notation. Within the frameworks of their accommodating projects, these are in no way inadequacies of the computational tools; still, the applicability of the tools remains limited outside the strictly theoretical concern.

On the other hand, a number of syntactic formalisms have been used to develop wide-coverage grammars for use in practical NLP systems. The best known of these is the Augmented Transition Network formalism due to Woods (1970). More recent examples are the DIAGRAM grammar (Robinson, 1982) of SRI's TEAM natural language interface (Grosz et al., 1987) and the PEG grammar developed at Yorktown Heights (Jensen et al., 1986). Both are capable of impressive coverage and this is, to some extent, due to the more flexible formalisms employed. A common feature of these formalisms is that they all fall prey to what Kaplan (1987) refers to as "the procedural seduction" of computational linguistics: whatever the basis for the notation is, it incorporates a handle for explicit intervention into the interpretation of the grammar at hand.

Sometimes the nature of the task for which the grammar is being developed justifies a formal notation incorporating 'hooks' for explicit procedures. Thus a number of machine translation (MT) projects, especially ones employing a transfer strategy, make use of formal systems for grammar specification, which, in addition to mapping surface strings into corresponding language structures, identify operations to be associated with nodes and/or subtrees (Vauquois & Boitet, 1985; Nagao et al., 1985). In general, the effects of the temptation to allow, for example, the EVALuation of arbitrary LISP expressions on the arcs of the ATN or the addition of "procedural programming facilities" to the rule-based skeleton of IBM's PLNLP have been discussed at length in the recent literature addressing the issues of declarative formalisms from a theoretical perspective (see Shieber, 1986a, and references therein). However, from the point of view of developing a realistic grammar with substantial coverage, the opening of the procedural 'back door', while perhaps useful for 'patching' the inadequacies in the linguistic theory during the exercise, can turn the whole process of grammar development and maintenance into an organisational nightmare, as side effects accumulate and ripple effects propagate.

A separate problem with allowing procedural attachment into the grammar formalism stems from the inevitable commitment to a particular version of a particular theory. Even when a deliberate effort is made to develop a flexible and general framework capable of accommodating a range of 'underlying' linguistic operations, such a framework is bound eventually to become inadequate, especially as modern theories of grammar (strive to) become more declarative and tend to make reference to larger bodies of knowledge. A case in point is the ARIANE system (Vauquois & Boitet, 1985): even though it was designed as a completely integrated programming environment, with the aim of enabling implementation of, and experimentation with, different linguistic theories, in reality the system has been unable to cope with radically new grammatical frameworks and computational strategies for text analysis.

The question then arises of the optimal way of developing a practical grammar. This paper will report on our experience in building such a grammar, with a particular emphasis on how a number of constraining factors have influenced the design and implementation of the software tools for supporting the linguist's work.

2. Design Considerations

For the last two years we have been engaged in a project aimed at substantial grammar development, as part of a larger effort to produce an integrated system for wide-coverage morphological and syntactic analysis of English. The overall objectives of the combined effort are described in a number of papers (see Russell et al., 1986; Phillips and Thompson, 1987, and Briscoe et al., 1987). We aimed to achieve comprehensive coverage of English in two years, using only one linguist and one programmer full-time; the complete natural language toolkit was to be made available to the research community outside the immediate environment where the grammar was being developed. Consequently, the software support for the linguist had to exhibit a number of characteristics to encourage high productivity, particularly

Currently with the IBM (UK) Science Centre. The work described here was supported by research grant GR/D/87321 from the UK Science and Engineering Research Council.
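
As an illustration of the procedural 'back door' discussed in Section 1, the following minimal Python sketch contrasts a purely declarative phrase-structure rule with rules that carry a procedural hook. It is not taken from any of the systems cited; the names `Rule`, `apply_rule`, `note_np` and `needs_np` are hypothetical and exist only for this example. The point it aims to show is that once hooks may read and write shared state, the outcome of applying a rule can depend on which rules were applied before it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

State = Dict[str, bool]  # shared mutable state visible to every hook


@dataclass(frozen=True)
class Rule:
    """A rule mother -> daughters, optionally with a procedural hook (hypothetical)."""
    mother: str
    daughters: Tuple[str, ...]
    hook: Optional[Callable[[State], bool]] = None


def apply_rule(rule: Rule, categories: Sequence[str], state: State) -> Optional[str]:
    """Return the mother category if the rule matches and its hook (if any) agrees."""
    if tuple(categories) != rule.daughters:
        return None
    if rule.hook is not None and not rule.hook(state):
        return None  # the hook vetoed an otherwise matching rule
    return rule.mother


def note_np(state: State) -> bool:
    """Hook with a side effect: records that an NP has been built."""
    state["seen_np"] = True
    return True


def needs_np(state: State) -> bool:
    """Hook that succeeds only if some earlier rule recorded an NP."""
    return state.get("seen_np", False)


DECLARATIVE_VP = Rule("VP", ("V", "NP"))               # no hook: order-independent
HOOKED_NP = Rule("NP", ("Det", "N"), hook=note_np)
HOOKED_VP = Rule("VP", ("V", "NP"), hook=needs_np)


def run(steps: List[Tuple[Rule, List[str]]]) -> List[Optional[str]]:
    """Apply each (rule, categories) pair in order against one fresh shared state."""
    state: State = {}
    return [apply_rule(rule, cats, state) for rule, cats in steps]


np_first = run([(HOOKED_NP, ["Det", "N"]), (HOOKED_VP, ["V", "NP"])])
vp_first = run([(HOOKED_VP, ["V", "NP"]), (HOOKED_NP, ["Det", "N"])])
declarative = run([(DECLARATIVE_VP, ["V", "NP"])])
```

In this sketch the hooked VP rule succeeds when the NP rule has run first and fails otherwise, whereas the declarative VP rule gives the same result regardless of history. In a grammar with many rules, such hidden dependencies are one way in which side effects can accumulate.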




Publication date: 1988